Whole-Genome k-mer Topic Modeling Associates Bacterial Families
Similar resources
Bacterial population assay via k-mer analysis
Identifying and assaying the relative abundance of members of complex microbial communities is an important problem in ecology. Sandberg et al. investigated the usage of genomic signatures to provide high identification percentages from short sequence samples. In this paper we present an improved naive Bayesian classification method using conditional probabilities, which can be used to classify...
SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly
Motivation: To tackle the problem of huge memory usage associated with de Bruijn graph-based algorithms, upon which some of the most widely used de novo genome assemblers have been built, we released SparseAssembler1. SparseAssembler1 can reduce memory consumption by as much as 90% compared with state-of-the-art assemblers, but it requires rounds of denoising to accurately assemble genomes. Alg...
Informed and automated k-mer size selection for genome assembly
Motivation: Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision. Results: ...
Mixed Modeling with Whole Genome Data
Objective. We consider the need for a modeling framework for related individuals and various sources of variation. The relationships could be either among relatives in families or among unrelated individuals in a general population with cryptic relatedness; both could be refined or derived with whole genome data. As for the variations, they can include oligogenes, polygenes, single nucleotide pol...
SEK: sparsity exploiting k-mer-based estimation of bacterial community composition
Motivation: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. As the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically time-consum...
Journal
Journal title: Genes
Year: 2020
ISSN: 2073-4425
DOI: 10.3390/genes11020197